Using Highly Compressed PDF

Use the Recognition component to create searchable PDFs that combine recognized text and images. The compression for each image is chosen based on OCR zone information recovered during recognition, which produces smaller PDF files. Highly compressed PDFs are best suited for scanned documents.

Follow these steps to save a highly compressed PDF:

1. Load the image
2. Recognize the image
3. Export recognition data to PDF
4. Save the PDF

Load the Image

See the Loading Images section for how to load an image. Alternatively, an image can be scanned into memory using the TWAIN or ISIS components.

Recognize the Image

Before an image can be used with the highly compressed PDF functionality, it must be recognized with the Recognition component. First, use the IG_REC_image_import function to prepare the image for recognition. Then use IG_REC_image_recognize to generate its recognition data. See the Optical Character Recognition section for more details.

Zone data generated during recognition is used to choose optimal image compression.

Any change to the recognized zone data, whether through manual zones or other zone options, may increase the final PDF file size.

When manually zoning an image, take care to mark any picture data with the IG_REC_WT_GRAPHIC zone type.

Export Recognition Data to PDF

First create a new PDF document with IG_mpi_create and IG_PDF_doc_create.

Next, prepare an AT_REC_PDF_PAGE_OPTIONS structure to re-compress the source image and add invisible text into the PDF page:

• Set SegmentImage to TRUE.

• Set VisibleImage to TRUE.

• Set VisibleText to FALSE.
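The three settings above can be sketched in C as follows. This is only an illustration: PdfPageOptionsSketch is a hypothetical stand-in that models just the three fields named in the list, while the real AT_REC_PDF_PAGE_OPTIONS structure is declared in the ImageGear headers and may contain other fields and use different field types. The helper names are also invented for this sketch.

```c
#include <assert.h>

#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif

/* Hypothetical stand-in for AT_REC_PDF_PAGE_OPTIONS. Only the three fields
   named in the text are modeled; the real structure may differ. */
typedef struct {
    int SegmentImage;  /* re-compress the source image using zone data */
    int VisibleImage;  /* keep the page image visible */
    int VisibleText;   /* FALSE places the recognized text invisibly */
} PdfPageOptionsSketch;

/* Returns options configured as described for a highly compressed page. */
static PdfPageOptionsSketch highly_compressed_page_options(void)
{
    PdfPageOptionsSketch options;
    options.SegmentImage = TRUE;
    options.VisibleImage = TRUE;
    options.VisibleText = FALSE;
    return options;
}

/* Returns nonzero only when all three settings match the list above. */
static int is_highly_compressed(const PdfPageOptionsSketch *options)
{
    return options->SegmentImage && options->VisibleImage &&
           !options->VisibleText;
}
```

In real code, the same three assignments would be made on an AT_REC_PDF_PAGE_OPTIONS variable before it is passed to the page creation call.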

Then call the Recognition component's IG_REC_PDF_page_create function for each recognized page to append a new highly compressed PDF page to the document.

Save the PDF

After the PDF document is created and its pages are added, use the IG_mpi_file_save function to save the PDF document to disk. Set the nFormat parameter to IG_FORMAT_PDF.
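Because each step depends on the one before it, an application would normally stop at the first step that fails. The sketch below shows that ordering with stub functions only. It makes no ImageGear calls, and every name in it (Step, Progress, run_steps, and the four stubs) is hypothetical. The convention that a step returns 0 on success is an assumption made for the sketch.

```c
#include <stddef.h>

/* Assumption for this sketch: a step returns 0 on success, nonzero on error. */
typedef int (*StepFn)(void *ctx);

typedef struct {
    const char *name;
    StepFn run;
} Step;

/* Tracks how many steps completed; fail_at simulates a failure at an index. */
typedef struct {
    int completed;
    int fail_at;   /* index of the step that should fail, or -1 for none */
} Progress;

static int do_step(Progress *p, int index)
{
    if (p->fail_at == index)
        return 1;          /* simulated failure */
    p->completed++;
    return 0;
}

/* Stubs standing in for the four steps; real code would call the SDK here. */
static int load_image(void *ctx)      { return do_step(ctx, 0); }
static int recognize_image(void *ctx) { return do_step(ctx, 1); }
static int export_to_pdf(void *ctx)   { return do_step(ctx, 2); }
static int save_pdf(void *ctx)        { return do_step(ctx, 3); }

static const Step kHighlyCompressedPdfSteps[] = {
    { "Load the image",                 load_image },
    { "Recognize the image",            recognize_image },
    { "Export recognition data to PDF", export_to_pdf },
    { "Save the PDF",                   save_pdf },
};

/* Runs the steps in order. Returns the index of the first failing step,
   or count when every step succeeded. */
static size_t run_steps(const Step *steps, size_t count, void *ctx)
{
    for (size_t i = 0; i < count; ++i) {
        if (steps[i].run(ctx) != 0)
            return i;
    }
    return count;
}
```

A caller would pass kHighlyCompressedPdfSteps with a count of 4 and compare the return value with 4 to see whether the PDF was saved; any smaller value identifies the step to report.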